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Abstract 

Gravitational wave detectors now under construction are sensitive to the 
phase of the incident gravitational waves. Correspondingly, the signals from 
the different detectors can be combined, in the analysis, to simulate a single 
detector of greater amplitude and directional sensitivity: in short, aperture 
synthesis. Here we consider the problem of aperture synthesis in the special 
case of a search for a source whose waveform is known in detail: e.g., compact 
binary inspiral. We derive the likelihood function for joint output of several 
detectors as a function of the parameters that describe the signal and find 
the optimal matched filter for the detection of the known signal. Our results 
allow for the presence of noise that is correlated between the several detectors. 
While their derivation is specialized to the case of Gaussian noise we show that 
the results obtained are, in fact, appropriate in a well-defined, information- 
theoretic sense even when the noise is non-Gaussian in character. 

The analysis described here stands in distinction to "coincidence analy- 
ses", wherein the data from each of several detectors is studied in isolation 
to produce a list of candidate events, which are then compared to search for 
coincidences that might indicate common origin in a gravitational wave sig- 
nal. We compare these two analyses — optimal filtering and coincidence — 
in a series of numerical examples, showing that the optimal filtering analysis 
always yields a greater detection efficiency for given false alarm rate, even 
when the detector noise is strongly non-Gaussian. 
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I. INTRODUCTION 



Several large interferometric gravitational wave detectors [pj-|3|] will soon be operating, 
perhaps to be joined by several as or more sensitive acoustic resonant detectors [|]-|6| in 
the years to come. There are several reasons for wanting to combine, in some way the, 
the data from these detectors in order to draw inferences about the presence of a signal 
or the parameters that characterize it: for example, observations of a signal with several 
detectors increase the degree of confidence in the detection and characterization of a signal, 
and the use of several geographically separated detectors can allow one to disentangle source 
parameters (e.g., sky position, polarization) that are degenerate when observed in a single 
detector. 

The signal response of both interferometric and acoustic detectors is sensitive to the 
phase of the incident gravitational waves; consequently, we have the opportunity to combine 
interfere the response of several detectors, synthesizing a single, effectively larger and more 
directionally sensitive detector: in short, aperture synthesis. Here we describe a likelihood- 
based analysis for the detection of a signal of known, parameterized form in the joint output 
of a network of detectors. Such an analysis makes the most effective use of the information 
available from each detector, in exactly the same sense that optimal filter described in @-|9| 
is best-suited for detecting or characterizing radiation incident on a single detector. 

The analysis described here stands in contrast to a "coincidence" analysis, whereby 
"events" are identified independently in separate detectors and these independent lists of 
events are later brought together and compared for coincidences [|L0,|Il]]. The key difference 
between the two analyses is their sensitivity to inter-detector correlations. The response of 
a network of gravitational wave detectors to an incident plane wave is phase coherent. This 
phase coherence is captured in the likelihood function describing the output of a network of 
detectors; however, it is absent in a coincidence analysis. This leads the likelihood analysis 
to have an increased detection efficiency for a given false alarm rate when compared to a 
coincidence analysis searching for the same signals and based on the same detector output. 

We demonstrate this directly in a set of three numerical examples, based on a network 
consisting of two separated detectors. In each example we search for a signal of known 
waveform arriving at an arbitrary time and from an arbitrary direction. Two different 
search algorithms are investigated. The first is based on the maximum of the likelihood of 
the joint detector output; the second is based on the analysis of coincidences between the 
optimally filtered output of each detector, considered separately. The three examples differ 
in the character of the simulated noise: in the examples the noise is either Normal, strongly 
leptokurtic or strongly platykurtic. In all cases the likelihood analysis described here gives 
a substantially larger detection efficiency for a fixed false alarm rate. 

Most previous work on multi-detector gravitational wave data analysis has focused on 
coincidence analyses or variations on coincidence analyses [jiT)|-|T2"|] that occur separately at 
each detector. The work described here parallels recent, independent work by Bose and 
collaborators 0,|T1[. It goes beyond that work by allowing from the beginning detectors 



that are not coincident and accommodating the possibility that the noise between pairs of 
detectors may be correlated. 

Section [TT] introduces terminology and nomenclature used throughout this work. In sec- 
tion DTI] the likelihood function for the joint output of a set of gravitational wave detectors 
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is derived. The form derived here is exact when the detector noise is Normal; however, we 
show that it is also the best choice (in the sense of making the least additional assumptions 
about the noise statistics) when only the detector cross- and auto-correlations are known. 
In section fV| we describe and present the results of several numerical examples, meant to 
compare likelihood and coincidence-based approaches to data analysis. Finally, in section [V] 
we summarize our conclusions. 



II. NOMENCLATURE 

A. Continuous and discrete time signals 

While gravitational wave detectors are analog devices, the detector output, including 
diagnostic channels containing physical and environmental monitors, will be quantized in 
magnitude and sampled discretely in time. Data analysis will operate exclusively with this 
discrete time series. 

Properly handled, quantization of the continuous amplitude contributes a small, white, 
additive component to the detector noise |[|, Chapter 3.7.3], which we will not discuss 
further here. 

When certain other conditions hold, the continuous in time and discretely sampled time 
series are equivalent and fully interchangeable. In this paper we have occasion to refer to both 
discrete and continuous time representations of the same underlying process. To distinguish 
between these two representations, we refer to the continuous time representation of the 
process x by x(t), where t denotes continuous time, and the discrete time representation of 
the same process by x[k], where integer k denotes the sample number. 



B. Scalar-, vector- and matrix- valued functions and sequences 

For our purposes, the output of a single gravitational-wave detector is a real, scalar- 
valued time series. An important purpose of this paper is to discuss data analysis for a 
gravitational-wave receiver, by which we mean a logical collection of detectors. The output 
of a receiver is a vector-valued time series. The elements of the vector at any given sample 
are the detector outputs at the corresponding moment of time. 

In this paper we will have occasion to refer to scalar-, vector- and matrix-valued time 
series. To distinguish between these different cases, we use a lowercase italic face (as in a) 
to denote scalars or scalar-valued sequences, a lowercase bold-italic face (as in a) to denote 
vectors or vector sequences, and an uppercase bold-italic face (as in A) to denote matrices 
or matrix- valued sequences (functions). 



C. Discrete Fourier transform 

Many different conventions for the discrete Fourier transform (DFT) can be found in the 
literature. We adopt the conventions described in this section. If you compare the notation 
used here with other conventions found in the literature, it will be to your advantage to pay 
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careful attention to the normalization, index range of the input and output sequences, and 
sign convention of the transform kernel. 

Suppose that x[k] is a sequence of coefficients of length N, with initial index k . The 
DFT x[k] of the sequence x[k] is the periodic sequence given by 

ko+N-l 

= £ x \. k \ u N' ( 2 - la ) 
fc=fco 

where 

u N = exp(-2iri/N) (2.1b) 

is the A^ th root of unity. (Note our use of the engineering convention for the transformation 
kernel. ) 

The kernel of the transform, lu j n , satisfies an orthogonality relationship, 

N-1 

^ = iV E <"~N l ( 2 - 2 ) 

n=0 

as k and I run from to AT — 1; consequently, the DFT is invertible. The inverse DFT 
(IDFT) is given by 



*N = i£^K nfe - (2- 3 ) 



fc=0 



Note that the IDFT is also a periodic sequence. In particular, if x[k] is the DFT of a 
sequence x[j], then x[k] is the periodic extension of x[j]. 

A wealth of useful results and insights into the DFT/IDFT pair are found in [16|; addi- 
tionally, |L5| discusses the use of the DFT/IDFT pair in digital signal processing. 



D. Terminology 

A receiver is a collection of one or more individual gravitational-wave detectors whose 
output is analyzed collectively to draw a single set of inferences regarding the presence and 
properties of radiation sources. The elements of the receiver are the detectors, which need 
not be co-located, co-oriented, or of identical design, sensitivity or bandwidth. For example, 
a single receiver might consist of 

• the three LIGO interferometers (a 4 Km and a 2 Km interferometer in Washington 
State, USA, and a 4 Km interferometer interferometer in Louisiana, USA) 0, 

• the Virgo interferometer (located in Pisa, Italy 

• the GEO-600 interferometer (located in Hannover, Germany 0), 

• the TAMA-300 interferometer (located in Tokyo, Japan flT7|]), 
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the ALLEGRO detector (a resonant mass detector located in Louisiana JT8[ ) . 



the AURIGA detector (a resonant mass detector located in Padua fT9|l), 

the Explorer detector (a resonant mass detector located in Frascati 0]), and 



• the Nautilus detector (a resonant mass detector located in Geneva pT|). 

A gravitational-wave receiver is thus a logical grouping of gravitational-wave detectors. 

The output of detector a is the scalar-valued time-series g a . The output g of a receiver 
composed of several detectors a, . . . , ui is the direct sum of the output of the individual 
detectors: 



g = g a ®-"@g u - (2.4) 

The receiver response m to an incident gravitational-wave h depends on parameters 
that reflect both the intrinsic properties of the source and the relationship of the source 
to the receiver (i.e., distance and orientation). For example, if the source is a stochastic 
gravitational-wave signal, then describes the signal spectrum and anisotropy; alterna- 
tively, if the source is an inspiraling compact binary system, then 6 describes the binary's 
component masses, spins and orbital characteristics. In this paper I discuss analysis for de- 
terministic signals: i.e., signals like those that arise from binary inspiral or rapidly rotating 
non-axisymmetric neutron stars, whose waveform can be accurately described in terms of a 
small number of parameters that characterize the source and its orientation with respect to 
the detector. 



III. THE LIKELIHOOD FUNCTION 
A. Introduction 

Let H and H e denote alternative, exclusive hypotheses regarding the presence or absence 
of a signal: 

jj _ ( proposition that the receiver 
youtput consists only of noise 

(proposition that the receiver 
output consists of signal sq I (3.1b) 
superposed with receiver noise J 

where 6 represents a set of parameters that differentiate signals in the detector: for example, 
if the signal is from an inspiraling binary neutron star or black hole system, the parameters 
include the component masses and the source distance from and orientation with respect 
to the receiver. In this section we derive the likelihood function A(g\6), describing the 
probability that g is observed under the hypothesis H e . 

Why the likelihood function? Recall that the likelihood is any function that, viewed as 
a function of 0, is proportional to P(g\Hg), the probability of observing g given the fixed 



(3.1a) 
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hypothesis Hq. In a Bayesian analysis, the goal is to determine or characterize P(He\g), 
the probability of the hypothesis given the observation. Through Bayes Theorem, this 
quantity is directly related to the likelihood. In frequentist statistical analysis the goal is to 
determine the improbability of observing g under the alternative hypotheses that the signal 
is absent or that it is present, or to establish confidence intervals — a range of hypotheses 
He for which the observation has, in a certain sense, high probability [22j| . In either case 
the likelihood plays a central role and no other function of the observation and hypotheses 
offers the prospect of stronger statements. 

In our derivation of the likelihood we assume that the noise in each detector is Gaussian 
and stationary. While the fundamental noise sources in gravitational wave detectors are 
all characterized by Gaussian-stationary statistics, the realities of an actual implementation 
- e.g., detector imperfections and environmental couplings — guarantee that the actual 
noise character will be neither Gaussian nor stationary. Characterizing the detector, which 
includes identifying instrumental and environmental artifacts in the "gravity-wave channel" 
and regressing or removing these artifacts to the greatest possible extent, is a necessary 
pre-requisite to the analysis of the data for gravitational waves. We assume here that these 
artifacts have already been dealt with as best possible, so that g contains no identifiable 
instrumental artifacts, transient or otherwise, and that the noise is stationary on timescales 
long compared to the duration of a signal. 

Even so, the noise will remain non-Gaussian and non-stationary. As long as the evolution 
of the noise character is adiabatic — i.e., it varies only on timescales long compared to the 
signal duration and the time required to estimate the moments of the noise distribution - 
we can treat the noise as stationary on any important sub-interval. It is also an excellent 
approximation to treat the noise as Gaussian. To see why, focus on a stationary noise process 
but without any assumption on its distribution V(x). Suppose, by observations on the noise, 
we estimate its mean \i (x) and variance a 2 ((x — yu) 2 ). What probability distribution makes 
the least assumptions about the process while remaining consistent with this information? 
This information-theoretic question has a definite answer: it is the probability distribution 
P(x\I) with maximum entropy, subject to the constraints on its mean and variance: i.e., 
the distribution P(x\fi,cr) that maximizes the functional 



where Ao, Ai and A2 are the Lagrange multipliers corresponding to the constraints on the 
overall normalization of the probability, the mean and the variance. The solution to this 
variational problem is 



i.e., the normal distribution. Similarly, if we have a correlated process known to us only 
through its mean and autocorrelation, 




(3.2) 




exp [(x - /i) 2 /2a 2 ] 



(3.3) 




(3.4) 
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the distribution that makes the least assumptions about the process beyond these is the 
multivariate normal distribution 



cxp 



P ({x[j}} C) = 1 _ 1 (3.5) 

/ (27r) ArT det||C|| 

where is a set of N? samples indexed by the sample times and the matrix C has 

components 

\\C\\ jk :=C[k-j]. (3.6) 

Thus, modeling (correlated) noise as arising from a (multi-variate) Normal distribution is 
simultaneously the best and most conservative assumption that one can make when all one 
knows is the noise mean and (co-) variance. 

Put another way, if our only knowledge of the noise character is its first and second 
moments (i.e., mean and correlation function or power spectrum) then treating the noise as 
anything but Normally distributed is to assume more than we actually know and, conse- 
quently, would lead us to overstate the accuracy of any conclusions we reach. 

We choose a particular normalization of the likelihood function, as the ratio of P(g\0) to 
the probability that g arises from detector noise alone. The signal is assumed to be a plane 
gravitational wave incident on the detector array, so that the detectors respond to the signal 
coherently. If the hypothesized signal is characterized by parameters 6 (e.g., binary system 
chirp mass, orientation, distance, etc.) then we denote the likelihood function A(g\0) and 
regard it, for fixed receiver output g, as a function of the signal parameterization 0. 

The derivation of A(g\0) for a deterministic signal in a multi-detector receiver parallels 
closely the derivation of A(g\0) given in |7], §11] for a single-detector receiver; however, in 
the generalization several new elements arise and are discussed here for the first time. 

In section flfB we walk through the construction of P(g\Ho,T), the probability of ob- 



serving g in the absence of any signal. Evaluation of P(g\He, X) (the probability of observing 
g when the signal characterized by 6 is present) and the likelihood function itself proceed 
much more quickly in section [111 C| . In section |111D| we describe a detection test based solely 
on the maximum of the likelihood and identify the signal-to-noise ratio with the maximum 
of In A over 6. In section 1111 El we discuss how the likelihood function can be evaluated 



quickly and efficiently, which is critical to its use in real data analysis. 



B. Probability that g is receiver noise 

Central to the evaluation of A(g\6) is the evaluation of P(g\Ho,I), the probability that 
the receiver output g is an example of receiver noise n. (Here X denotes other, unenumerated 
conditions.) The formulation of this probability density for a single detector (scalar- valued 
time series g) was discussed in 0; in this section we review that discussion, setting the stage 
for our treatment of the more general problem of a multi-channel receiver (vector- valued g). 
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1. Single- channel time series 



Focus attention first on the single-channel output of a single detector. When the detec- 
tor noise is Gaussian and stationary, any single sample n[j] of detector noise is normally 
distributed with a mean and variance independent of when the sample was taken. With- 
out loss of generality we can assume that the ensemble mean n vanishes, in which case the 
probability that a sample g[j] of detector output is a sample of detector noise is 

PteM|g.,I) = °P'-£MV 2g 1 . (3.7a) 

The variance a 2 of the distribution is the ensemble average of the square of the detector 
noise: 

a 2 = tf. (3.7b) 



Equation |3.7a| holds true for each sample g[j}; consequently, the joint probability that 



the length sequence g is a sample of detector noise is given by the multivariate Gaussian 
distribution 



exp 



-iY$&gV\\\c-% k g[k] 



P(g\H ,l) = L - (3.8) 

y / (27r) JVT det||C|| 

In place of the variance a 2 that appears in the exponent of equation |3.7a| is the matrix C -1 . 



As P(g\H ,X) is a probability so C -1 is positive definite and invertible. The matrix C gives 
the covariance of the noise process: it is equal to the ensemble average of the product of the 
detector noise at different samples: 



n[j]n[k] = \\C\\ jk . (3.9) 

Since the detector noise is also assumed to be stationary, ||C|| jfc can depend only on the 
difference j — k; correspondingly, C is constant on its diagonals: i.e., it is a Toeplitz matrix. 
Since C is Toeplitz it is characterized completely by the scalar sequence c[k] of length 2Nt — 1 
whose elements are the first row and column of C: 

c[j~k] := \ \C\\ jk . (3.10) 

This sequence is the noise autocorrelation. The DFT of c[k] is related to the noise power 
spectral density. 



2. Multi- channel time series 

Now consider a receiver consisting of Np component detectors, where the receiver output 
is an iV^-dimensional time series. Without loss of generality assume that, while the different 
detectors that comprise the receiver may be sampled at different rates, all the detector 
outputs have been resampled so that the interval between samples in all detector data 
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streams is At. The receiver output g is a multi-channel time series: a vector-valued sequence 
consisting of the direct sum of the output g a of the individual detectors. Similarly, the 
receiver noise n is the direct sum of the noise contributions n a to the output of the Nd 
detectors: 

g[k] := gi[k] ® ■ ■ ■ ® g No [k] (3.11a) 
n[k] :=n l [k]®---®n ND {k}. (3.11b) 

Focus attention on a single sample of the receiver output: i.e., the vector g[j]. Since the 
receiver noise is Gaussian and stationary, the probability that sample g\j\ is a sample of 
receiver noise n is given by the multivariate Gaussian 



Pm\**M = eXPH9ljKrl,gLj (3.12a) 

1 (2h) Nd det\\C[ 



where the matrix C[0] is the ensemble average of the outer product of a sample of receiver 
noise with itself: 



C[0] = n[j]®n\j]. (3.12b) 

As suggested by the selection index [0], C[0] is one element of a sequence of correlation 
matrices. Each element of this sequences arises from the ensemble average of the outer 
product of receiver noise with itself at a different time. Since the receiver noise is stationary, 
the ensemble average depends only on the time difference; consequently, 



C[j — k) — n[j] <g> n[k). (3.13) 

Just as the DFT of the autocorrelation sequence for a single detector is related to its noise 
power spectral density, so the DFT of C[k] is related to its noise power spectral density. 
In this case, the power spectral density is a matrix-valued function of frequency, with the 
diagonal components equal to the power spectral density of the noise in a particular detector 
and the off-diagonal components related to the cross-spectral density of the noise in two 
different detectors. 

Equations |3. 12| hold true for each individual sample of receiver output g[j}; consequently, 
the joint probability that the length Nt sequence g is a sample of receiver noise is also a 
multivariate Gaussian: 



cxp 



iT^g\J]'\\^ T {C)-%k'9[k] 



P(g\H ,l) = L - = L , (3.14a) 



{2ti) NtNd D Nt (C) 

where Tn t (C) is a block Toeplitz matrix, i.e., a matrix whose elements 11X^(0)11 - fe are 
themselves Toeplitz matrices: 

\\T NT (C)\\ jk = C[j-k}. (3.14b) 

Each "element" of T Nt (C) is thus a N D x N D matrix. The inverse T A r T (C) _1 of T Nt (C) is 
defined in the usual way 
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AT T -1 

Wn,, = E ipwcoii,-,- H'Wcr'lL (3.i4c) 



1=0 

N T -1 



£ cf i>'-fl-ll T ^( c )" l IL' ( 3 - 14d ) 



«=0 



where • indicates the usual matrix product between Np x No matrices and In d is the 
Nd x Nb unity matrix. 
Finally, 

D Nt (C) = det 1V T (C); (3.14e) 

i.e, Dn t (C) is the determinant of Tn t (C), regarded as an NtNz> x NtNd matrix. 

The argument of the exponential in equation |3.14a| takes the general form of a symmetric 



inner product of two vector- valued sequences with respect to T^ T (C) _1 . Terms like this 
occur frequently enough that it is convenient to introduce a special notation for them: in 
particular, we define the symmetric inner product 



N T -1 

<a,b> c :=- J] [a[j]-||T iVr (C')- 1 || j . fc -6[A ; ] + 6[j]-||T WT (C)- 1 || jfc -a[A ; ]] (3.15) 

j,k=0 



so that 



P(g\0,l)= ^-<^a>c] (316) 
(2tt) NtNd D Nt (C) 



C. The likelihood function for a deterministic signal 

Consider the case of a deterministic signal, i.e., a signal whose time series is known up to 
parameters 0. An example is the gravitational-wave signal from a coalescing binary system, 
where the parameters include the binary's position and orientation on the sky, component 
masses and spins, and distance. 

Distinguish between the signal itself, which we denote h(6), and the receiver response 
to the signal, which we denote m(6). Assume that the receiver response is linear in the 
applied signal; consequently, the probability of observing g when the signal h(0) is present 
on the receiver is the same as the probability of observing g — m(6) as a sample of receiver 
noise: 



P(g\H ,l) =P{g- m(0)\H o ,l) (3.17a) 

(3.17b) 



exp [- (g - m(0),g - m{0)) c } 



\2tt) NdNt D Nt {C) 

The likelihood function is thus 



10 



A(gr|0) = exp [2 (g, m(0)) c - (m(0), m(0)) c ] , 



(3.18) 



where we have exploited the symmetry of the inner product <> (cf. eq. |3.15| ). 

The sole influence of the signal on the likelihood owes to the first term in the argument 
of the exponential in equation |3.18| , (g, m(0)) c . This term is a linear filter applied to the 



observation g. That linear filter is the Wiener optimal filter for a gravitational wave receiver. 

D. The maximum likelihood test 

In a Bayesian analysis, the product of the likelihood and the a priori probability density 
P(0\T) is proportional to the a posteriori probability P(0\g). From this probability density 
one can decide with what confidence one believes that a signal has been observed or construct 
Bayesian credible sets — regions of parameter space encompassing a given fraction of the 
total probability that takes on a given value. 

In a frequentist analysis confidence intervals and upper limits are constructed from the 
likelihood, although certain ad hoc assumptions are required to make the procedure definite 
P^~P3|. When our goal is simply to decide whether a signal is present a common candi- 
date procedure, recommended both by analysis p6| and intuitive appeal, is the maximum 
likelihood test: Let 

= ( Domain of physically permissible ) , (3.19a) 
Amax(<?) =maxA((?|0), (3.19b) 

~, N / Parameter for which \ /n _ x 

= ( A(g\8) = A m „( S ). ) • < 3 ' 19C > 

and choose a threshold A . Given an observation g, evaluate A max (g). If A max exceeds A , 
then conclude that the signal corresponding to 0(g) has been observed. 

If is not on the boundary of 0, then the maximum value of the likelihood function is 
also an extremum of A(g\0) and 

lnA(g\0) =< m d ,m d > . (3.20) 



E. Evaluation of < a, b >c 



A naive evaluation of < a,b >c following its definition in equation |3.15| has a high 
computational cost: 

1. Solving the linear system Tn t (C) ■ x = b requires 0[(NtNd) 3 ] operations; 

2. Evaluating the inner product a ■ x requires O(NtNd) operations. 

The operation count for this evaluation of < a, b >c is dominated by the solution of the 
linear system in the first step and would appear to be prohibitive, even if if could be done 
accurately, for all but the shortest time series. 
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In fact < a,b >c can be evaluated in at most O(N^Nt) operations. To do so requires 
only that we pre-process the input data through a chain of between one and three linear 
filters. In this section we describe these filters and the inner-product of the filtered time- 
series. 



1. The Linear Filter 



The desired pre-processing is conveniently described as a sequence of three linear fil- 
ters. The first filter simply whitens separately the output of each detector in the receiver. 
The second forms linear combinations of the whitened detector outputs to form a basis of 
"pseudo-detectors" whose cross-correlations vanishes, while the third whitens the pseudo- 
detector output. Thus, the first filter can be formed and applied without reference to the 
other detectors, while the second and third filters are the identity if the cross-correlation 
between distinct detectors vanishes. 

a. First filter: Whiten detector noise Using any convenient technique |27| whiten the 



output of each detector. Denote with a prime the whitened detector outputs and their cross- 



correlation when regarded as part of a receiver: i.e., 
k and 



n, 



is the whitened noise from detector 



C'[j - k] = n'[j] ®n'[k}. 
Write the cross-spectral density Toeplitz matrix C in block-form as 



(3.21) 



a 



where 



/ C 
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c\ 



12 



C 1N D \ 



°21 °22 ' • • 27Vd 



\ C N D 1 C N d 2 ■ ■ ■ C> N D N D J 



(3.22a) 



C'[l-m]=n>[l]n> k [m} 



(3.22b) 



Focus on the diagonal blocks C' kk , which are the Toeplitz matrices corresponding to the 
autocorrelation of the output of detector k. These blocks are constant multiples of the 
unity matrix, the constant being simply the mean-square noise amplitude in detector k, a\. 
Absorb this constant into the whitening filter for each detector so that the whitened output 
has mean-square amplitude unity and the C' kk are just the unity matrix. 

Focus attention now on cross-correlations, which are represented by the off-diagonal 
Toeplitz matrices in equation |3.22aj . If the cross-correlation between the detector outputs 
is consistent with zero then we are done with the pre-processing. If, on the other hand, 
C'[j — k] is non-zero for j ^ k, then we have two additional pre-processing steps, which we 
describe below. 
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b. Second filter: Diagonalization The vector g' corresponds to the direct sum of the 
output of the several detectors that form the network, after their output has been separately 
whitened. In this step we we form a new "basis" of detectors whose noise is uncorrelated: 
i.e., we find the linear filter described by the coefficients b[k] and a[k] such that n", 

N b N a 

n"[k] := -n'[k- j] - ^ a\j] ■ n"[k - j], (3.23) 

3=0 j=l 

has the property 



n'![l] ® <[m] = C'![l - m]5 jk . (3.24) 

This transformation, applied to g', yields g". 

Standard system identification techniques |27j can be used to find the appropriate trans- 



formation. For purposes of illustration only we describe one way to find such a transforma- 
tion, involving just sequence b. Focus on the discrete Fourier transform of C"[fc], 



A' 



C'[k] = C 'IMn-i, (3-25) 



n=-N 



where we have chosen N large enough that C'[k] vanishes for k > N. The quantity C is 
the two-sided (discrete) cross-spectral density of n'. Each component C'[k] is an x 
Hermitian matrix. Consider the sequence of unitary transformations U[k] such that 

U[k)* ■ C'[k] ■ U[k] (3.26) 

are each diagonal. This sequence exists as long as the noise is not fully correlated in any 
pair of detectors, at any frequency. Additionally, the symmetries of C guarantee that U[k] 

equals U [—k]. Consequently, the inverse discrete Fourier transform of the sequence U[k] 
yields a real linear filter U[k] that, when applied to the vector-valued receiver noise n', 
yields an output n" whose cross-correlation is diagonal: i.e., 



n'![l]n'l[m] = if j ^ k (3.27a) 

where 

N 

n"[k]= U\j]-n'[k-j}. (3.27b) 

j=-N 

c. Third filter: Final whitening Following the formation of a pseudo-detector basis the 
n'j will not necessarily be white. The final step of pre-processing is to whiten separately the 
output sequence corresponding to each pseudo-detector, absorbing the overall normalization 
a" 1 into the filter so that the rms output of each pseudo-detector is unity. This final step, 
since it does not involve combining the output of the different pseudo-detectors, does not 
change the vanishing cross-correlation of the output of different pseudo-detectors. 
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Following this final pre-processing we are left with the receiver output g'" whose noise 
component has the desired property 

n'j'[l] ® n'»[m] = 8 jk 5 lm ; (3.28) 

i.e., the noise in each pseudo-detector is white, and the noise in different pseudo-detectors 
is uncorrelated. 

Since all of the operations involved in this pre-processing are linear filter operations the 
computational cost of processing a length N T sequence of network output g to g" is strictly 
proportional to Nt- The order of the filters involved is independent of Nt- The determina- 
tion of the filters themselves may be a somewhat time-consuming operation; however, since 
the receiver noise is stationary on timescales long compared to the signal duration these 
filters need to be found or modified very infrequently.^] The first and (if needed) third linear 
filter operations are done separately for each detector or pseudo-detector; correspondingly, 
they involve a factor of Np. Finally, the second transformation, which (if needed) forms 
the basis of independent detectors, is really O(Np) linear filters. Hence, evaluation of the 
linear filter on a sequence of length Nt requires O(NqNt) operations generally, falling to 
O(NdNt) if there is no cross-correlated noise between the detectors. 



2. Reformulating the inner product 



We presume that we have pre-processed the receiver output as described in the sec. [Ill E 1 
Following this preprocessing the inner product can be rewritten as 

Nt — 1 



<a,b> c :=±J2 [a[j]-\\T N AC)- 1 \\ jk -b[k] + b[j].\\T NT (C)- 1 \\ ik -a[k]\ (3.29) 

j,k=0 
N T -1 

= - £ o!'\j] ■ b"\j], (3.30) 

j,k=0 

which has an operation count of O(NdNt)- Including the pre-processing, the operation 
count scales linear with Nt and either linearly or, at most, quadratically with the number 
of detectors in the receiver No- 

Most of the work involved in calculating the inner product is in dealing with the correla- 
tions: both auto-correlations of the individual detector outputs and cross-correlations of the 
different detectors in the receiver: i.e., in "inverting" T^ T (C). Owing to the special struc- 
ture of T Nt (C) the effect of its inverse in the inner product can be expressed by applying 
a sequence of linear filters, each of order independent of Nt, to the inputs (i.e., the linear 
transformations described in sec. |T UE 1| ). These transformations are determined entirely by 
the statistical character of the noise in the receiver, which changes only on timescales long 
compared to the signal duration; consequently, the asymptotic operation count for the inner 
product is at most 0(NtN]j) and not 0[(N T N D ) 3 } as a naive estimate might suggest. 



Were this not true then we would fail also to satisfy the basic assumptions that allow us to 
construct the optimal filter for a single detector. 
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F. Signal-to-noise 



Following the identification of pseudo-detectors whose noise is white and uncorrected 
the inner product < mg, mg >, which is the maximum of the log-likelihood, is recognized 
as the half the sum, over the pseudo-detectors, of the ratio of two quantities: the mean- 
square response of the receiver to the signal and the mean-square pseudo-detector noise. 
Correspondingly, we identify 

p 2 :=< mg, mg > (3.31) 

as the (power, or amplitude-squared) signal-to-noise ratio. 

When the detector noise is uncorrelated the signal-to-noise of the network is clearly 
related to the sum of the signal-to-noise of the component detectors: related — not equal 
- because in the analysis of several detectors as a single receiver we have imposed the 
important constraint that the signal parameters appearing in the separate signal-to-noise 
ratios are identical. To see the importance of this constraint, consider No detectors making 
an observation for a signal s(t\9). Assume for simplicity that everything about the signal 
(including its waveform and start time) is known except for its amplitude 9, so that 

a(t\9) := Os(t). (3.32) 

Suppose also that the noise in each detector is uncorrelated with the noise in any other 
detector. Since the detectors are independent in this way we can write the receiver likelihood 
A(g\0) as a product of the separate detector likelihoods Aj(gj\9). 

Now consider detector j acting independently of the rest. For this detector, 

In Aj-G&lfl) = 29 ( 9j , Sj ) Cj - 9 2 ( Sj , Sj ) c . , (3.33) 

where the subscript j indicates the relevant quantity with regards to detector j. The maxi- 
mum likelihood point-estimate of 9 based on the observation gj in detector j is 

9 3 = ^fi- (3-34) 



3/Ci 



and the corresponding S/N is 



20 2 { Sj , Sj ) Cj (3.35a) 
2 ( J^k. (3.35b) 



J/Ci 



In general, 9j will not be equal to 9k and pj will not be equal to pk- If gj is (Gaussian- 
stationary) receiver noise, than the ensemble average of each p 2 is equal to unity. The 
maximum of the log of the product of the likelihoods for the separate detectors is then half 
the sum over the p 2 , or N D /2. 

On the other hand, if we consider all the detectors to be part of a network, then 
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In A(g\0) = 29 (g, s) c - 9 2 (s, s) c . (3.36) 
The maximum likelihood point-estimate of 9 based on the observation g is 

6 - life p- 37 ' 

and the corresponding S/N is 

p 2 = 29 2 (s,s) c (3.38a) 



2 



= 2 i^4^ (3.3.S1,) 

= 2 E ^\^ (3.38c) 

The S/N p 2 for the network is not equal to the S/N in any component detector, nor is it 
equal to the sum of the p 2 treated as independent quantities. With the constraint that the 
individual detectors in the receiver must respond coherently to any signal, the ensemble 
average of p 2 is equal to unity, not Np. The constraint that each detector in the receiver 
responds coherently to the incident signal reduces the variance in p 2 below what one would 
expect from a simple sum of the individual p 2 . 



IV. EXAMPLE APPLICATIONS 
A. Introduction 

In this section we apply the formalism developed in the previous two sections in several 
numerical examples, which are based on a model source detected by a model receiver. Our 
aim is to illustrate one way that likelihood-based detection might be used in a network analy- 
sis, to demonstrate that its performance is superior to the kind of coincidence based analysis 
described by [[[0 -12], and to explore the performance of the likelihood and coincidence tests 



when the detector noise is strongly non-Gaussian. 

The model source and detectors are described in section [IV B| . For simplicity we focus 



on testing the null hypothesis H (i.e., "the signal is absent"). In section [IV U\ we describe 
two different ways of testing this hypothesis: via a threshold placed on the maximum of the 
likelihood function for the joint output of all the detectors and via an analysis of "coinci- 
dences" between events identified separately in each detector. Monte Carlo simulations are 
used to evaluate the detection efficiency as a function of the false alarm fraction for each 
test in two different circumstances: Gaussian detector noise and a mixture-Gaussian model 
of non-Gaussian detector noise. 

B. Model receiver and source model 

Consider two detectors, denoted " + " and " — " , separated by a distance 2R. For the 
purpose of illustration, assume that the noise in each detector is white (i.e., uncorrelated) up 
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to the Nyquist frequency, uncorrelated between the two detectors, and has two-sided power 
spectral density amplitudes S + and S_ in the + and — detectors, respectively. Assume 
also that these detectors have no orientation: they respond identically to radiation incident 
from any direction. 

In addition to parameters that describe the internal state of a radiation source and its 
orientation relative to the detector line of sight, every signal incident on the receiver is 
characterized by a "signal start time", describing when the initial wavefront reaches the 
receiver. It is convenient to measure time at the mid-point between the two detectors in our 
receiver, so that the signal start time is defined to be the moment that the initial wavefront 
reaches the mid-point between the two detectors. 

Signals are also characterized by their incident direction relative to the receiver. Since the 
two detectors have an isotropic antenna pattern, the receiver response to radiation incident 
from different directions depends only on the angle between the axis defined by the two 
detectors and the radiation's propagation direction. Figure |T] shows the geometry we use to 
describe the interaction of the model receiver with an incident gravitational wave, with the 
cosine of this angle by X . 

Again for the purpose of illustration, consider a model astrophysical burst source pop- 
ulation whose members have a well-determined waveform of finite duration. Assume the 
sources are standard candles and radiate isotropically, with radiation waveform two cycles 
of a sine wave of known frequency /q. Denoting the signal arrival time at the midpoint be- 
tween the detectors by T and the signal amplitude as A , the response of the two detectors 
to the signal is 

g = f A sin 27r/o [t - (T ± RX )] if < /„ [t - (T ± RX )] < 2 
1 otherwise. 

where we have dropped the distinction between the signal and the receiver response to the 
signal and assumed that the detector bandwidth is much greater than the signal bandwidth. 

Since /o is known, the receiver response to a signal is fully characterized by A , T and 
Xq. Acting alone, the ± detector can measure only A Q and T±, where 

T± := T ± RX . (4.2) 



C. Likelihood function 

Recall that the receiver's correlation sequence is 

C[j —k] = n\j]®n[k], (4.3) 

where, for our two detector receiver, each C[k] is a 2 x 2 matrix. We have assumed that 
the noise in detector + is uncorrelated with that in detector — ; consequently, each C[k] is 
diagonal and the matrix Tn t (C) is conveniently re-organized into a 2 x 2 block diagonal 
matrix: 

||T Nt (C)|| = diag(C+,C_) = ( °+ ° V (4.4a) 
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where C± is the correlation matrix for the ± detector, 

\\C±\\ Jk = n±\j]n±[k]. (4.4b) 
Expressed in this way, it is apparent that the likelihood function is separable: 

A(g\A,X,T ) = A(g + \A,T + )A(g_\A,T„) (4.5a) 

where 

lnA(g±\A,T±) = 2 < g ± ,m(A,T±) > c± - < m(A,T ± ,m(A,T ± ) > c± (4.5b) 

The A(g±\A,T±) are exactly the likelihood functions for the ± detectors regarded as single, 
isolated receivers. This separation is always possible when the noise in the detectors is 
uncorrelated. When the noise in the detectors is correlated the likelihood is still separable 
once the pseudo-detectors are defined through as described in sec. [111E 1| . 

D. Signal detection 

In this section we consider, in the context of our model receiver, two different ways one 
might use a pair of detectors to detect a signal and infer its parameters. One procedure 
exploits the notion of coincidence: if the two detectors identify separately a signal with 
sufficiently similar parameters then the receiver is said to have detected a signal. The other 



procedure exploits the notion of correlation as developed in section |IJ: if the response of 
an array of detectors is consistent with an incident plane-wave, then the receiver is said to 
have detected a signal. For each of these two tests we determine the detection efficiency as a 
function of the false alarm error fraction when the detector noise is Gaussian. We find that 
the test based on correlation, as embodied in the receiver likelihood function, has a greater 
detection efficiency than the coincidence test for any choice of false alarm fraction. 



1. Maximum Likelihood Inference 

The likelihood function A(g\6) is the dimensionless ratio of two probabilities: the prob- 
ability of making the observation g if the signal 6 is present and the probability of making 
the observation g if no signal is present. It is not a probability itself, nor by itself does it 
relate directly to a probability on 0. 

Even though we can't regard the likelihood as a measure of the probability that a signal 
characterized by is present, we can regard it as a measure of the plausibility of that 
conclusion: when A(g\0) is greater than unity it signals that the particular observation g 
is more likely when the signal characterized by is present than when no signal is present. 
Similarly, if we assume that a signal is present, then the parameter 6 that maximizes the 
likelihood function is the most plausible description of the signal. 

Together, these observations motivate a test based on the likelihood function: when the 
likelihood function maximum A(g\0) exceeds a threshold A , then we conclude that a signal 
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is present and take 6 to be the maximum likelihood estimator, or MLE, of the detected 
signal. 

To be precise, consider an observation g, whose N samples g[k] are taken at time tk- 
Denote by s(6)[j] the sampled receiver response to a signal characterized by 6. Assume that 
the observation duration is much longer than the longest signal response (so that we need 
not consider signals that begin before or end after the observation period). Denote by 6 k 
the parameter space of signals whose leading wavefront is incident on the receiver at time t^. 
in our example, these are just A and X. Finally, fix a threshold pp. The following procedure 
produces a list of detected signals and point estimates of the parameters describing each: 

1. Evaluate the log-likelihood function \nA(g\Ok) for signals incident on the receiver at 
the sample times tk- 

2. At each sample time tk, find the signal characterization Ok that maximizes \nA(g\6k). 
Associate with each tk and 6k a S/N pj., given by 

pt = \nA(g\6k). (4.6) 



Order the triplets {pk, 6k, tk} with respect to tk. Select the subset of triplets where i) 
Pk is greater than the threshold po and ii) a local maxima; i.e., find the {pw, 6k'} for 
which 

Po < p±,k' (4.7a) 
P±,k>-i < P±,k' (4-7b) 
P±,k'+i < P±,k>- (4.7c) 



4. Beginning with the largest pk in this subset find all other triplets {pV, 6y, ty} for 
which \tk — tk'\ is less than the signal duration. Discard these tripletts. Repeat with 
the next largest remaining p^ until the list is exhausted. What remains is the list of 
detected signals, with S/N p k , signal start time t k , and characterized by (the point 
estimate) 6k- 

Three steps in the maximum likelihood test procedure deserve additional discussion: the 
focus only on bursts starting at the discrete sample times tk (step |2|), the formation of the 
intermediate list of consisting of local maxima of the maximum of the likelihood function 
(step |3|) and the pruning of this list to form the final list of detected signals (step [|). 

Step focuses attention on signals arriving at the discrete sample times. Real signals, 
however, are not so constrained. Nevertheless, if the observation is properly sampled (i.e., 
sampled without aliasing), then all of the power in the receiver response is at frequencies 
much less than the Nyquist frequency f^, which is half the sample rate. In that case 
{m{6, to), m{6, t + r)) c , where m(6,t) denotes the receiver response to a signal whose 
initial wavefront arrives at the network at time t, cannot vary significantly for \t\ less than 
several times 1//^; correspondingly, the likelihood will remain peaked about the tk nearest 
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to the actual signal arrival time and the corresponding signal to noise will differ only slightly 
from its maximum value.0 

In step [|, we select only the local maxima of the likelihood function as candidate signal 
events. This reflects the observation that, in the absence of noise, the likelihood function is 
maximized when is equal to the true signal characterization 6 t . 

Even in the absence of noise, however, not all local maxima can be identified as dis- 
tinct signals. While the likelihood function is maximized when 6 is equal to the true signal 
characterization 6 t , as 6 differs from 6 t the likelihood decreases, but not necessarily mono- 
tonically. Even for our simple signal model there are three local maxima associated with 
the likelihood function. The situation is further complicated when, as is the actual case, 
receiver noise distorts the "noise-free" likelihood, randomly increasing it for some 6k and 
decreasing it for others. 

To help distinguish between the global maximum of the likelihood function and its side- 
lobes, we make use of our implicit assumption that real signals are sufficiently rare that the 
receiver response to one real signal does not have a significant probability of overlapping 
with its response to a second real signal. Any two local maxima separated in time by less 
than the signal duration are then associated with a single source. In step |] we prune the 
list of candidate signals (i.e., the local maxima identified in step ^ by identifying clusters 
of local maxima and replacing each with its single, strongest member.^ The result is a 
list of events, all above threshold, in which no two events can have resulted from the same 
gravitational-wave signal. 

Finally, we justify the use of 6 as the point estimate of the signal parameters charac- 
terizing the detected signal. Suppose that a signal characterized by fixed t is incident on 
an ensemble of identical receivers. The corresponding ensemble of 6 has as its mode t ; 
consequently, a natural estimator for 6 t is the 6 arising from a particular observation. In 
the maximum likelihood rule, when we conclude that a signal is present we take as our 
point-estimate of the signal parameters. 



2. Coincidence inference 

Much discussed in the context of gravitational- wave data analysis is an apparently simpler 
analysis, referred to generally as "coincidence." This test has received its most precise 



definition in fllOlJlTl for the particular case of binary inspiral observations. 

In general, coincidence tests involve a complete analysis at each individual detector, 
considered in isolation from all other detectors in the receiver. The result of these indi- 
vidual analyses is a set of "candidate-event lists", one for each detector, which consist of 
"detections" at each detector together with estimators for the signal start time and other 



2 The errors incurred here can be reduced still further by appropriate interpolation. 

3 This intrinsically non-linear step introduces into the analysis a notion of detector "dead time:" 
i.e., the analysis is unable to identify more than a single signal in any given interval of duration 
less than the signal duration. 
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signal parameters that can be determined from observations in a single detector. Real 
gravitational-wave events should excite the several different detectors in a self-consistent 
manner: in particular, the signal start times should be consistent with the light travel time 
between the detectors and other signal parameters should be consistent with a unique source. 

The consistency requirement is difficult to pin-down. For example, in the case of our 
own model detector and source, consistency would appear to require that the signal arrival 
times are consistent with the signal propagation time between the detectors and that the 
measured signal amplitudes be equal. Owing to detector noise, however, the estimated 
signal amplitudes will only approximate the actual amplitudes, and similarly for the signal 
start times and other parameters. For signal amplitudes, then, a window of some breadth 
must be defined and signal candidates whose amplitudes fall within the window are assumed 
to arise from a real signal. The choice of window, its implementation and the procedure 
for combining separate estimates of common parameters all affect the false alarm and false 
dismissal fractions that characterize the test. 

The problem is more complicated in the case of an estimated source location. Consider 
a real signal, incident on the detectors from a direction nearly perpendicular to the axis 
between them. Let the measured start time T± on the ± detector be 

T± = T 0t ± + e±, (4.8) 

where To,± is the actual moment when the signal is incident on the ± detector and e± is the 
error in the estimated start time owing to detector noise. The difference in the measured 
signal start times is 

T+ - T_ = T 0;+ - T _ + e+ - e_ 

= 2RX Q + Ae, (4.9) 

where X is the direction cosine describing the radiation's propagation direction and Ae is 
the difference between the errors in the measured start times. When a real signal is incident 
along, or nearly along, the axis {\X \ ~ 1), then small errors |Ae| <C 2R can lead to \X \ > 1, 
which we would regard as unphysical and not representative of a real signal. On the other 
hand, when \Xq\ is much less than unity, the same errors will leave us with \Xq\ still much 
less than unity, in which case we accept the coincidence as representing a real signal. To 
minimize this false rejection of real signals we can adopt a window broader than the light 
travel time between the detectors for comparing signal arrival times (taking the estimated 
arrival direction to be along the axis if the difference of arrival times would suggest \X\ 
greater than unity); however, in doing so we also increase the false alarm rate, reducing the 
discriminating power of the test. 

The sign of the error also plays an important role: when Ae has the same sign as X 
we are more likely to reject a real signal than when they have opposite signs. The fraction 
of signals rejected can thus depend in a complicated way on the interaction between the 
underlying signal parameters, the windows, and the allowable range of the parameters that 
characterize the signal. 



In the spirit of [10,11 1 we define a coincidence inference procedure for our model receiver: 



1. For each detector considered in isolation, determine the two sets of candidate signals 
associated with detector + and — : 
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(a) Evaluate the log-likelihood function ka.A(g±\d± t k) for signals incident on detector 
± at the sample times t±^. 

(b) At each sample time t± j k, find the signal characterization 0± k that maximizes 
In A(<7±|0± ; fc). Associate with each t±$ and 0± k a S/N p±,k, given by 

^ )fc = lnA(^ ± |? ±>fc ). (4.10) 

The result is a set of associated signal-to- noise, parameterizations and signal start 
times {p±,k,0±,k,t±,k}. 

(c) For the list associated with each detector, select the subset {p±,k'i 0±,fc', t±,k'} f° r 
which 

P±,o < P±,k> (4.11a) 

P±,k'-i < P±,k' (4.11b) 

P±,k'+i < P±,k> (4.11c) 

where p± ; o is the signal detection threshold in detector ±. 

(d) Beginning with the largest p± 5 fc in the list of local maxima associated with detector 
±, find all other p±,fe' for which \t k — tk>\ is less than the signal duration. Discard 
these. Repeat with the next largest remaining p± >k until the list is exhausted. 
What remains is the list of candidate signals, with S/N p± ;0 , signal start time 
t± t k, and characterized by (the point estimate) 6± tk - 

2. Choose the candidate signal list associated with detector +. Beginning with the can- 
didate signal of largest S/N p +jfc in that list, process that list in order of decreasing 
p +t k to create a new, coincident detection event list: 

(a) Let the current candidate event from the list associated with detector + be num- 
bered k + . 

(b) Identify, from the candidate event list associated with detector — , all candidates 
whose start times tj_ in detector — are consistent with the candidate signal arrival 
time tk + in detector +: i.e., 

\t k+ - tj_\ < 2R. (4.12) 

Impose other consistency requirements, associated with + ^ + and 0-j_, as are 
deemed appropriate. (In our model receiver /source example we do not impose 
any other consistency requirements.) 

(c) The result is a list of candidate coincident events in detector — associated with 
the event k + in detector +. The list may contain zero, one or more than one 
event: 

i. If it contains no events, delete event k + from the list of candidate events 
associated with detector +. 
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ii. If it contains exactly one event (say, 0~ ! j_, £_j_}), pair it with the 

event {p +j k + , 0+,k+, t+ t k + } from the list associated with detector + and add 
the pair to the coincident detection event list. Delete all events from the 
candidate list associated with detector + whose arrival times are so close 
that they would overlap with event similarly, delete all events from the 
candidate list associated with detector — whose arrival times are so close 
that they would overlap with event j— Delete events k + and j_ from their 
respective candidate lists. 

iii. If it contains more than one event, choose the single event j_ with greatest 
strength p_j, pair it with event k + from the candidate list associated with 
detector +, and add the pair to the coincident detection event list. Delete all 
events from the candidate list associated with detector + whose arrival times 

are so close that they would overlap with event k + ; similarly, delete all 
events from the list associated with detector — whose arrival times t-j are 
so close that they would overlap with event j_. Delete events k+ and j_ from 
the lists associated with the respective detectors. 

The result of applying this procedure to the output of the ± detectors is a set of paired 
events, one from each detector. Each member of the set involves a pair of signal amplitudes 
(in this case, equivalent to S/N) and the best estimate of the signal arrival time at each 
detector. The signal arrival times are, by construction, consistent with the incidence of a 
plane wave on the detector pair. 

It remains to combine the signal arrival times and amplitudes in each pair to determine 
a single estimate of the signal amplitude, the signal arrival time at the mid-point between 
the detector, and the radiation propagation direction. In our model problem, the natural 
estimators for the latter two quantities are 

Tj = {t+j + t-j) /2 (4.13a) 
Xj = (t +tj - t-j) /2R. (4.13b) 

The geometry of our model problem suggests no particular procedure for combining the 
separate signal amplitude estimates into an overall estimate for the network. One procedure 
that has been recommended [fTT^injI is to form the final estimate as the root mean square of 
the point estimates: 



where 



A ± := Pl (4.14b) 

{s±,s±) c± 

This prescription will consistently overestimate the amplitude of the signal. For any given 
observation of a signal with amplitude A , the estimate in the detector ± is equal to A plus 
a random variable: 
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A± = A + n ± . 



(4.15) 



If the detector noise is Gaussian then n± is Gaussian. The mean square of the point estimates 
is thus 

A 2 = A 2 + A {n + + n_) + ^(n 2 + + n 2 _). (4.16) 

The mean of A 2 , or of A, will thus be greater than A$. A similar problem will plague any at- 
tempt to form network estimates of parameters from parameters that are overdetermined by 
the network (for example, network- wide estimates of T or X from three or more detectors). 

An unbiased estimate for A is the straightforward average of the parameter values, 
which we adopt here: 

Ar-=2 + ( 4 - 17a ) 

where 

A ± , k := , } ^ : . (1.17b) 

s ±,e ± y s ±,e ± . k 

Several aspects of this procedure for detecting a signal coincident in two detectors and 
estimating the parameters characteristic of the source deserve special attention: 

a. Candidate signals. Each event in a pair identified as a coincident detection stands 
on its own as a detection in its detector at the given threshold. 

b. Estimator bias. When the noise distribution is Gaussian, the error in the estimator 
of the signal arrival time at a particular detector is also Gaussian. Consequently it might 
be thought that the error in the estimators Tj and Xj are also normal, since they arise 
from the combination of normal errors in two detectors whose noise is normal, and that the 
estimators Tj and Xj are unbiased. (This is the claim is made in |^0|,|lT|.) As discussed 



above, however, the errors in the t±j are correlated by the procedure we used to create the 
coincident detection list: for radiation whose propagation direction is nearly aligned with 
the axis between the detectors, insisting that \t + j — t-j\ be less than 2R causes us to favor 
those signals for which the errors in t + j and t_j are positively correlated. The estimator 
Xj in equation [4.13b| is thus biased to underestimate the magnitude of the propagation 
direction cosine; additionally, signals whose propagation direction cosine is large (i.e., signals 
propagated along or nearly along the axis) have larger false dismissal fractions than signals 
propagating normal to the axis. 

c. Signal strength. In the maximum likelihood test described above, signal strength 
is described by a single quantity: the S/N, which is equal to the log maximum likelihood. 
This measure of signal strength has the desired property that, as the detection threshold is 
increased, weaker signals are no longer considered to be detected before stronger signals. In 
the coincidence test, there are two different "signal-to-noise" ratios — one for each detector 
- and neither, by itself, is sufficient to determine that a signal is present. It has been 



24 



suggested flQjJllll that the "natural" signal strength for coincidence tests is the sum of the 
amplitude- squared S/N for the different detectors: in this case 

% = ? +J + ffLj- (4-18) 

This definition has the undesirable property that "stronger" signals (i.e., those with larger 
p) are not necessarily more likely to be detected than weaker ones. In particular, as the 
detection thresholds are raised at the two detectors, signals disappear from the coincident 
detection list when the weakest member of the pair {p + ,p_} falls below the threshold in the 
± detector, which is not when p 2 falls below threshold. If we want signal strength to have 
the property that, as the detection threshold is increased, weaker signals disappear from the 
detection list before stronger ones then the appropriate measure of signal strength is the 
minimum of p + and jo_ :0 

pEEmin(p + ,p_). (4.19) 

The detection rule described in this section is not the only such rule in the spirit of 
coincidence that can be defined. Many variations are possible, corresponding to the many 
ad hoc decisions that must be made, especially in identifying candidate events lists for the 
separate detectors and identifying "consistent" coincidences. The choices made here are 
among the simplest that lead to a well-defined procedure for identifying coincident events. 

3. Gaussian Noise 

To assess the relative performance of the maximum likelihood and coincidence inference 
rules we use Monte Carlo simulations to calculate the false alarm and false dismissal fractions 
a and /3 as well as the distributions of the estimators To, X and Aq for a typical signal. 

An inference rule's false alarm frequency N a is the limiting frequency of "signal detection" 
when, in fact, no signal is actually present. To determine N a as a function of the threshold 
Po we use a statistical model of the receiver noise to generate many pseudo-random instances 
of g representative of receiver noise alone. The false alarm frequency is then the average 
number of "detections" per unit time. A convenient, dimensionless representation of the 
false alarm frequency is the average number of false signals detected per sample g\k\: 

a = N a /f s , (4.20) 

where f s is the sample rate. We refer to a as the false alarm fraction; by our procedure a is 
strictly less than or equal to unity and can be regarded as the probability of a false detection 
on a per-sample basis. 

The false dismissal frequency (3 of an inference rule is the limiting frequency with which 
the rule reports that no signal is present when, in fact, a signal is present; thus, /3 is a 



4 When the several detectors in the network are not identical, or do not have coincident or isotropic 
antenna patterns, then the criteria that weaker signals are always less likely to be detected than 
stronger ones becomes more difficult to determine. 
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function of the signal (or the signal population). Another way to think about (3 is as the 
detection efficiency: 1 — (3 is the fraction of actual signals that the detection procedure will 
identify. To find (3 we generate many pseudo-random instances of receiver noise and add to 
them a specific signal. The result is many instances of g corresponding to observations of 
that source. The inference rule will conclude that no signal is present in some fraction of 
these synthetic observations: that fraction is the false dismissal fraction. 

In the case of the maximum likelihood test, a and (3 are controlled by adjusting the 
threshold p : as p is increased, a is decreased. In the case of the coincidence test described 
in section [IV D 2[ a and (3 are controlled by adjusting the two thresholds p± j0 - Since, in 
our example, the two detectors are identical, we set these equal to the same p . The false 
dismissal frequency depends on the distribution of signals in the signal population; for 
simplicity, we assume that all signals in the population have the same unknown amplitude 



A and sky location X , which are given in the first column of table |Hl . 

(More realistically the amplitude A depends inversely on the distance to the source, its 
orientation with respect to the detector, and other parameters. Corresponding to the source 
distribution in space and the other parameters is a distribution for A. Requiring that a not 
exceed a certain value sets a threshold A , regardless of this distribution. The false dismissal 
frequency (3, on the other hand, depends on this distribution in a straightforward way.) 

Figure shows 1 — (3 as a function of a for the maximum likelihood and coincidence 
detection procedures when used to detect signals of this character in our model receiver. 
In these Monte Carlo simulations we count as a false dismissal all signal identifications 
(whether by the coincidence or maximum likelihood test) where the identified start time T 
differs from the actual start time by more than the signal duration, or where the identified 
sky position X differs from actual sky position by more than the signal duration divided by 
the detector separation. This condition is necessary if "correct detections" by either rule are 
to include only those candidate events with non-zero signal power. For all a, the maximum 
likelihood test has a substantially higher detection efficiency 1— (3 than the coincidence test; 
consequently, its performance is substantially better than the coincidence test. 

The better performance of the maximum likelihood test holds independent of the actual 
signal parameters, though it is more significant for weak signals than for strong ones. It 
is, however, these weak signals — those just above threshold — that determine the overall 
efficiency of the detector. For astrophysical burst sources, which are most likely distributed 
cosmologically and hence isotropically, the S/N p is inversely proportional to the source dis- 
tance; consequently, the number of sources "brighter" than the threshold po is proportional 
to Pq 3 . Of these, a fraction e are "dimmer" than p e , where 

p e = p /(l-e) 1/3 . (4.21) 

Half of all events whose expected S/N is greater than p have S/N less than 1.26po- Thus, 
if the false dismissal fraction is large when measured for events at threshold, it will be large 
when measured over all events. 

Note also in figure |2| that the false dismissal fraction for the coincidence test asymptotes 
to a non-zero value as the false alarm frequency increases (corresponding to a lower threshold 
po): i.e., even at zero-threshold there are false dismissals. The asymptote depends on the 
signal for which the false dismissal frequency is computed: it is lower for stronger signals 
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and higher for weaker ones. The non-zero asymptote for the coincidence test originates in 
the process that selects candidate events in each detector. In the coincidence test, a false 
alarm event that occurs close in time to a real signal event can mask that real event if it 
has a higher S/N (cf. coincidence test step |TdT ). The false alarm may be sufficiently different 
than the signal event it masked that, when an attempt is made to pair it with a candidate 
event in the other detectors (cf. coincidence test step ||), the test concludes that no signal 
is present at all; alternatively, the test may identify a signal at a point in the sky or with a 
start time so different from the actual location or start time that the identification must be 
regarded clS db false alarm and not a signal detection. 

The same mechanism also operates in the maximum likelihood test (cf. maximum like- 
lihood test step U); however, that test is much less sensitive to this effect. In particular, 
noise events that would cause a masking false alarm in the coincidence test do not lead to 
a large S/N in the maximum likelihood test since there the S/N is suppressed when the 
detector-detector cross- correlation is not consistent with a real signal. 

The difference in the relative performance of the maximum likelihood and coincidence 
tests is directly traceable to the different ways in which each test requires consistency in the 
response of the different detectors: in the maximum likelihood test the relative response of 
the assembled detectors is required to be consistent with the incidence of a single wavefront 
on the receiver, while in the coincidence test the individual response of each detector is 
represented in just a few parameters and an ad hoc consistency is imposed only on the 
relative value of these parameters, none of which sample the correlated response of the several 
detectors in the receiver. 



4- Correlated Noise 

The example just given is in the context of noise uncorrelated between the two detectors. 
How do the coincidence and correlation test fare when the noise in the several receiver 
detectors is correlated? 

In the context of the coincidence test, correlated noise leads to an increase in the over- 
all false alarm frequency as noise events leading to a candidate event in one detector are 
correlated with noise events leading to a candidate in the other detector. No means of dis- 
tinguishing between these new false alarms, which arise from the noise cross-correlation, and 
correlations arising from signals is possible in a coincidence test; consequently, the only way 
that the correlated noise can be accommodated is by an increase in the thresholds applied to 
the output of each detector. This increases the false dismissal fraction, leading to an overall 
worsening of the test's performance. 

The likelihood function, on the other hand, directly accommodates correlated noise in 
a precise manner. In the context of the maximum likelihood test, noise correlated between 
the detectors means that the C[k] are no longer diagonal; however, as we have seen (cf. 
sec. |1IE|) this poses no analysis problems in either principle or practice. By construction, 



then, the maximum likelihood test distinguishes inter-detector correlations whose spectrum 
is characteristics of a real signal from inter-detector correlations that are characteristic of 
correlated detector noise. Consequently, when noise is correlated between the receiver's 
detectors we expect the maximum likelihood test to perform still better than the coincidence 
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test. 



E. Non-Gaussian Noise 



Equation |3.18| describes the likelihood function only when the receiver noise is Gaus- 
sian. The noise in a real detector will be, at some level, non-Gaussian and non-stationary: 
some fundamental contributions to the noise may be intrinsically non-Gaussian, some con- 
tributions may be intrinsically Gaussian but appear non-Gaussian in the output owing to 
non-linearities in the receiver's response, and some contributions will reflect the environment 
that the detector finds itself in.Q We have already shown that, lacking detailed knowledge of 
the higher order moments of the detector noise distribution, a Normal distribution is the the 
best approximation to a noise distribution whose mean and variance are known (cf. |111 A| ). 
We refer to this as the Gaussian-approximation likelihood function. 

We expect that the maximum likelihood test will still outperform the coincidence test, 



since it is still the case that the correlation analysis described in section |IV D 1| is sensitive 
to the coherent response of a receiver's component detectors to a real signal in ways that 
the coincidence analysis described in section |IVD 2| is not. Coincidence tests misidentify 



as signals coincident non-Gaussian noise events as readily as they do coincident Gaussian 
noise events, while a Gaussian-approximation likelihood test reject non-Gaussian events that 
are inconsistent with an incident plane gravitational wave as easily as they do inconsistent 
Gaussian events. Thus, we expect in general that the detection efficiency for fixe false 
alarm fraction will be greater for a test based on the Gaussian-approximation likelihood test 
statistic than for a coincidence test based on the individual detector responses. 

To demonstrate this point, we simulate non-Gaussian noise according to two models 
- one strongly leptokurtic and one strongly platykurtic — and apply the coincidence and 
Gaussian-approximation maximum likelihood tests described in sections |IV D 1| and [IV D 2 



to calculate the relationship between detection efficiency and false alarm fraction for a fixed 
signal. A convenient model for a stationary non-Gaussian noise process is the mixture 
Gaussian. A mixture Gaussian distribution has the form 

P (x\{ Pi , <n,i = l... N}) = f ft eXP [{X -J^l 2 ^ (4.22a) 

N 

J2p i = 1 ( 4 - 22b ) 

i=l 

Pi > 0. (4.22c) 

By appropriate choice of the constants Pi, /ij and <7j a mixture Gaussian can approximate 
any uncorrelated noise distribution through its first 2N moments. 



5 We distinguish between noise transients, which are generally relatively short bursts, and non- 
stationarity, which we use to mean adiabatic changes in the statistical character. 
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The two distributions we model here are drawn from the mixture Gaussian distribu- 



tions described in table |IV|. The corresponding probability distribution functions are shown 
graphically in figure |3|. Note that each is strongly non-Gaussian, though in different ways. 

Figures ^ and [5] show the detection efficiency as a function of the false alarm frequency for 
the coincidence and Gaussian-approximation maximum likelihood tests for the leptokurtic 
distribution and platykurtic distributions, respectively The detection efficiency and false 
alarm fractions were determined by Monte Carlo simulations. The signal parameters used 
in the detection efficiency simulations are given in the second and third columns of table 
ITJ. The conclusion reached earlier — that the maximum likelihood test is superior to the 
coincidence test — is not sensitive to the approximation of Gaussian noise. There is no 
qualitative difference between figures |4], [5|, and |2], which summarize the relative characteris- 
tics of these two tests when the noise is strongly non-Gaussian or Gaussian. In all cases the 
detection efficiency of the (Gaussian- approximation) maximum likelihood test is less than 
that of the coincidence test at the same false alarm fraction. The principal reason for this 
superior performance is the same here as in the case of Gaussian noise: the response of two 
or more detectors to incident radiation is correlated, and the Gaussian-approximation max- 
imum likelihood test is sensitive to the expected inter-detector cross-correlations, reducing 
the S/N when the correlations are not consistent with waves from the same source. 



V. CONCLUSIONS 

The output of several gravitational wave detectors can be combined, in a form of aperture 
synthesis, to form a single, more sensitive gravitational wave detector. Here we describe 
such an analysis, based on the likelihood function, appropriate to the detection of a burst 
gravitational wave source, of known waveform, in a network of gravitational wave detectors. 
This likelihood analysis of the joint output of several detectors leads to the optimal matched 
filter for the output of the multi-detector network. 

The analysis presented here stands in contrast to "coincidence" analyses, where the 
output of each detector is studied separately to arrive at a list of events, which are then 
compared between the detectors to determine if there are any coincidences, which may be 
taken to be evidence for gravitational waves. 

The critical difference between these two analyses is that the likelihood is sensitive to the 
coherent response of the detector network to the incident signal. This leads the likelihood 
based analysis to have greater discriminating power than a coincidence analysis, as shown 
by a greater detection efficiency to false alarm frequency ratio. 

The importance of inter-detector correlations is clearly important when looking for a 
signal; however, it is also important when the noise in two or more detectors is correlated. 
Such correlations are naturally accommodated in the optimal filter developed here; however, 
they cannot be naturally accommodated in a coincidence analysis which, by its very nature, 
ignores inter-detector correlations. 

The likelihood function derived here begins with an assumption that the detector noise 
is Gaussian-stationary; however, the results obtained are much more general. We show 
that treating non-Gaussian as if it were Gaussian is, in a very well-defined sense, the most 
appropriate choice if the only available characterization of the noise is through its mean 
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and correlation function. While it is sometimes claimed [|T^] that coincidence is the only 
reasonable test when the noise is non-Gaussian, we show in a series of numerical simulations 
that the even when the noise is strongly non-Gaussian the likelihood test, treating the noise 
as if it were Gaussian, outperforms the coincidence test as measured by the ratio of detection 
efficiency to false alarm fraction. 

A naive estimate of the computational cost of computing the matched filter for a network 
of detectors might suggest that the cost is proportional to the cube of the length of the time 
series and the number of detectors. If the calculation is properly organized, however, the 
cost is seen to be strictly proportional to the duration of the signal being sought and no 
more than the square of the number of detectors in the network. 

The work described here can be considered aperture synthesis specialized to the problem 
of searching for signals of finite duration whose waveform is known. The problem of searching 
for stochastic signals remains to be studied. True aperture synthesis — searching for fringes 
in the interfered output of several gravitational wave detectors — may well be the most 
sensitive means of searching for the truly unanticipated source and is a particularly promising 
direction for future research. 
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TABLES 



Functions Sequences 

Time Domain Frequency Domain Time Domain Frequency Domain 
Scalar a(t) a(/) a[k] a[k] 

Vector a(t) a(/) a[k] a[k] 

Matrix A(t) A(f) A[k] A[k] 

TABLE I. Nomenclature. 



Detector separation (2R) 25//o 
Sampling frequency 4/q 
Detector noise PSD (S±) l//o 

Observation duration (To) 100//o 

TABLE II. Parameters describing the toy receiver and signal duration used for examples in 
this paper. All parameters are given in units of fo, the reciprocal signal duration. 



Gaussian Leptokurtic Platykurtic 



A 


2.5 


3.5 


3.0 


X 


0.0 


0.0 


0.0 



TABLE III. Parameters describing the signal used in the example calculation of the detection 
efficiency for the Gaussian, platykurtic and leptokurtic noise examples. 





Leptokurtic 


Platykurtic 


p 


(1/2,1/4,1/4) 


(61/192,131/384, 131/384) 


/' 


(0,2,-2) 


(0,2,-2) 


a 


(L2,2) 


(i,M) 


Mean: 








Std. Dev.: 


2.1213 


1.9311 



TABLE IV. Parameters describing the two mixture Gaussian models used to explore the max- 
imum likelihood and coincidence test performance non-Gaussian noise in section [VE| . Also shown 



are the mean and standard deviation of the distributions. The corresponding PDFs are shown 
graphically in figure |3| 
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FIGURES 




FIG. 1. To illustrate the effectiveness of a correlation analysis compared to a coincidence anal- 
ysis we apply both to a model problem involving two identical detectors of isotropic response. 
The detectors are denoted + and — . Their relative separation and the parameters describing the 
direction of an incident plane wave signal are shown in this figure. 
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FIG. 2. The false alarm frequency a vs. the detection efficiency (1 — 0) for the maximum 
likelihood and coincidence tests in the presence of Gaussian noise. The parameters describing the 
signal, to which the detection efficiency refers, are given in the first column of table ID. Note that 
the performance of the maximum likelihood test is everywhere superior to the performance of the 
coincidence test. The degree of superiority will vary with signal strength; however, the relativity 
performance of the two tests will not. The superior performance of the likelihood based test is 
attributable to the way in which the maximum likelihood test internalizes the detector-detector 
correlations that are present when a real signal interacts with the receiver. For more details see 
section IIVD 3. 
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FIG. 3. The probability distribution functions for the leptokurtic and platykurtic non-Gaussian 
noise models used to test the relative performance of the coincidence and Gaussian-approximation 



likelihood test. For more details see table TV\ and section IV E 
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FIG. 4. The false alarm fraction a vs. the detection efficiency (1 — 0) for the Gaus- 
sian-approximation likelihood and coincidence tests in the presence of strongly leptokurtic 
non-Gaussian noise. The noise is described by the mixture Gaussian model whose parameters 
are given in the first column of table IV and the signal used for calculating the detection efficiency 



is described in table [HI]. Figure |3| shows the noise PDF graphically. Note that, even though when 
noise is substantially non-Gaussian, the Gaussian-approximation likelihood test has significantly 
better performance then the coincidence test. The superiority is attributable to the fact that real 
signals are correlated between the detectors, and the Gaussian approximation likelihood test, even 
when the noise is not Gaussian, is still sensitive to those correlations. For more discussion see 



section IV E 
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FIG. 5. The false alarm fraction a vs. the detection efficiency (1 — (3) for the Gaus- 
sian-approximation likelihood and coincidence tests in the presence of strongly platykurtic 
non-Gaussian noise. The noise is described by the mixture Gaussian model whose parameters 
are given in table IV and whose PDF is shown graphically in figure |3[ For more discussion see 



section IV E 
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